An integrated text mining framework for metabolic interaction network reconstruction

Authors

  • Preecha Patumcharoenpol
  • Narumol Doungpan
  • Asawin Meechai
  • Bairong Shen
  • Jonathan H. Chan
  • Wanwipa Vongsangnak
Abstract

Text mining (TM) in biology is fast becoming a routine analysis for extracting and curating biological entities (e.g., genes, proteins, simple chemicals) and their relationships. Because TM is widely applicable to situations involving complex relationships, it is valuable to apply it to the extraction of metabolic interactions (i.e., enzyme and metabolite interactions) through metabolic events. Here we present an integrated TM framework containing two modules: one for extracting metabolic events (Metabolic Event Extraction module, MEE) and one for constructing a metabolic interaction network (Metabolic Interaction Network Reconstruction module, MINR). The proposed framework performed well on the standard measures of recall, precision and F-score. Evaluation of the MEE module on the constructed Metabolic Entities (ME) corpus yielded F-scores of 59.15% and 48.59% for detecting production and consumption events, respectively. When the entity tagger for genes and proteins (GP) and metabolites was tested on the test corpus, the F-score exceeded 80% for the Superpathway of leucine, valine, and isoleucine biosynthesis. Mapping of enzyme and metabolite interactions through network reconstruction showed fair performance for the MINR module on the test corpus, with an F-score above 70%. Finally, applying the framework to large-scale data (i.e., EcoCyc extraction data) to reconstruct a metabolic interaction network gave reasonable precision of 69.93%, 70.63% and 46.71% for enzymes, metabolites and enzyme-metabolite interactions, respectively. This study presents the first open-source integrated TM framework for reconstructing a metabolic interaction network. The framework can be a powerful tool that helps biologists extract metabolic events for the subsequent reconstruction of a metabolic interaction network.
The ME corpus, test corpus, source code, and virtual machine image with pre-configured software are available at www.sbi.kmutt.ac.th/ preecha/metrecon.
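
The abstract reports performance as precision, recall and F-score over extracted entities and interactions. As a minimal sketch, assuming extracted metabolic events can be reduced to (enzyme, metabolite) pairs and scored against a reference set of pairs, the metrics can be computed as below. The `Event` record, `interaction_edges` and `precision_recall_f` are hypothetical names for illustration, not the authors' MEE/MINR code, and the F-score is assumed to be the balanced one (harmonic mean of precision and recall). The toy data uses placeholder names and is not taken from the paper.

```python
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical record for one extracted metabolic event (not the MEE output format)."""
    enzyme: str
    metabolite: str
    kind: str  # e.g. "production" or "consumption"


def interaction_edges(events: Iterable[Event]) -> set[tuple[str, str]]:
    """Reduce events to enzyme-metabolite pairs; the event kind is dropped,
    so production and consumption of the same pair yield a single pair."""
    return {(e.enzyme, e.metabolite) for e in events}


def precision_recall_f(predicted: set, reference: set) -> tuple[float, float, float]:
    """Precision, recall and balanced F-score (harmonic mean of P and R)."""
    tp = len(predicted & reference)  # true positives: pairs present in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data with placeholder names; not results from the paper.
extracted = [
    Event("E1", "M1", "production"),
    Event("E1", "M2", "consumption"),
    Event("E2", "M3", "production"),
]
reference = {("E1", "M1"), ("E1", "M2"), ("E3", "M4")}

p, r, f = precision_recall_f(interaction_edges(extracted), reference)
print(f"precision={p:.2%} recall={r:.2%} F-score={f:.2%}")
# prints: precision=66.67% recall=66.67% F-score=66.67%
```

Scoring on sets of pairs means duplicate mentions of the same interaction are counted once; a per-mention evaluation would instead compare lists of events.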

Volume 4

Publication date: 2016